We investigate the distribution of the depth of a node containing
a specific key or, equivalently, the number of steps needed to
retrieve an item stored in a randomly grown binary search tree. Using
a representation in terms of mixed and compounded standard
distributions, we derive approximations by Poisson and mixed Poisson
distributions; these lead to asymptotic normality results. We are
particularly interested in the influence of the key value on the
distribution of the node depth. Methodologically our message is that the
explicit representation may provide additional insight if compared to
the standard approach that is based on the recursive structure of the
trees. Further, in order to exhibit the influence of the key on the
distributional asymptotics, a suitable choice of distance of probability
distributions is important. Our results are also applicable in connection
with the number of recursions needed in Hoare’s [Comm. ACM
4 (1961) 321-322] selection algorithm Find.


1. Introduction. The classical algorithm for storing data sequentially 
into a binary search tree proceeds as follows: The first item is put into 
the root node; subsequent elements are compared to the existing nodes, 
starting with the root, moving to the left if smaller than and to the right if 
greater than the content of the node until an external node is found. If there 
are n distinct (and comparable) values, then we obtain a random binary 
tree if we assume that all permutations of the data are equally likely. This 
data structure and its properties are discussed in the standard texts of the 
area; see, for example, Knuth (1973), Cormen, Leiserson and Rivest (1990) 
and Sedgewick and Flajolet (1996). Mahmoud (1992) gives a book-length 
treatment of random search trees. 
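The insertion procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `node_depths` and the dictionary-based tree layout are assumptions made here, not part of the original text.

```python
from typing import Dict, Optional, Sequence


def node_depths(keys: Sequence[int]) -> Dict[int, int]:
    """Insert distinct keys one by one into a binary search tree.

    Returns a dictionary mapping each key to its depth (distance from the root).
    """
    left: Dict[int, Optional[int]] = {}
    right: Dict[int, Optional[int]] = {}
    depth: Dict[int, int] = {}
    root: Optional[int] = None
    for key in keys:
        left[key] = right[key] = None
        if root is None:
            # The first item is put into the root node.
            root, depth[key] = key, 0
            continue
        current, d = root, 0
        while True:
            # Move left if smaller than the node content, right if greater.
            child = left if key < current else right
            d += 1
            if child[current] is None:
                # An external node has been found: store the key here.
                child[current] = key
                depth[key] = d
                break
            current = child[current]
    return depth


if __name__ == "__main__":
    print(node_depths([3, 1, 2, 5, 4]))
```

Retrieving a stored key follows the same path, so the number of steps of a successful search is one plus the depth returned here.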
Suppose now that a binary search tree is associated with a random
permutation of the set $\{1,2,\dots,n\}$ in the above manner. One of the quantities
of interest in this structure is the depth $X_{n,l}$ of the node containing $l$, that is,
its distance from the root; $1 + X_{n,l}$ is the number of steps needed to retrieve
the value $l$ (“successful search”). Arora and Dent (1969), in an early paper
on the subject, obtained a simple and explicit formula for the corresponding
expectation.


\[ \text{(AD)} \qquad E(1 + X_{n,l}) = H_l + H_{n+1-l} - 1, \]

where $H_k := \sum_{i=1}^{k} 1/i$ are the harmonic numbers. This result implies
that the average number of steps needed grows logarithmically only. It is
easily seen, however, that $X_{n,l}$ can be as large as $n - 1$, which motivates a
closer analysis of its distribution.
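For small $n$ the formula (AD) can be checked by brute force. The following sketch averages $1 + X_{n,l}$ over all $n!$ permutations with exact rational arithmetic; the helper names (`depth_of_key`, `harmonic`, `expected_steps`) are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import permutations


def depth_of_key(perm, key):
    """Depth of `key` in the binary search tree grown from `perm`.

    An earlier element is an ancestor of `key` exactly when it falls into the
    current interval (lo, hi) around `key`, which it then narrows.
    """
    lo, hi, depth = 0, len(perm) + 1, 0
    for x in perm:
        if x == key:
            return depth
        if lo < x < hi:
            depth += 1
            if x < key:
                lo = x
            else:
                hi = x
    raise ValueError("key not contained in permutation")


def harmonic(k):
    """Harmonic number H_k as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def expected_steps(n, l):
    """Exact value of E(1 + X_{n,l}) obtained by enumerating all permutations."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(1 + depth_of_key(p, l) for p in perms), len(perms))


if __name__ == "__main__":
    n = 6
    for l in range(1, n + 1):
        print(l, expected_steps(n, l), harmonic(l) + harmonic(n + 1 - l) - 1)
```

The enumeration is only feasible for small $n$, but it confirms the formula term by term for every key $l$.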

In contrast to many other characteristics of the tree such as its height or 
total path length, the depth depends on two parameters, the size n of the 
base set and the key value (or label) $l$ of the node, which complicates the
analysis. Averaging the distributions over the second parameter avoids this 
problem; the result can be interpreted as the distribution of the depth of a 
key or node selected uniformly at random from the available range {1,..., n}. 
Louchard (1987) obtained a corresponding asymptotic normality result; see 
also Section 2.5 in Mahmoud (1992). The distance of two randomly selected 
nodes has recently been investigated by Mahmoud and Neininger (2003). 
Averaging leads to a loss of information, though. For example, it is
immediate from (AD) that

\[ \lim_{n\to\infty} \frac{EX_{n,1}}{\log n} = 1, \qquad \lim_{n\to\infty} \frac{EX_{n,\lceil n/2\rceil}}{\log n} = 2, \]

that is, the depth of the node with the smallest key is only about half of
that of the node with the median key value on average, if the size of the
base set is large.

Our intention here is to obtain distributional approximations and
asymptotics for $X_{n,l}$ that are sufficiently precise to show the dependence of the
depth of a node on its key. The main tool is a distributional representation
of $X_{n,l}$ in terms of mixed and compounded distributions from well-known
families (Theorem 1). In contrast to many investigations in this area we
do not base our analysis on a recursion for the quantities of interest, but
exploit the relationship to records which seems to have been noticed first
by Devroye (1988). Devroye used this connection to investigate the depth of
the last node; he wrote that it “allows us to obtain ... hopefully insightful
proofs ....” The representation can also be used to obtain the expectation of
$X_{n,l}$ and therefore leads to an alternative proof for Arora and Dent’s (1969)
formula. Somewhat to our surprise, asymptotic normality in the sense that
$(X_{n,l_n} - EX_{n,l_n})/\sqrt{EX_{n,l_n}}$ converges in distribution to a standard normal
variable holds for every sequence $(l_n)_{n\in\mathbb{N}}$. This result has also been obtained
by Devroye and Neininger (2004). It implies Louchard’s (1987) result for
randomly selected nodes, but it can also be used to see the influence of
the key on the node depth on the level that is apparent from the
consequence of (AD) mentioned above: If the key value $l_n$ varies with $n$ such that
$\log l_n/\log n \to t \in [0,1]$, then $(X_{n,l_n} - (1+t)\log n)/\sqrt{(1+t)\log n}$ is
asymptotically standard normal. However, if $l_n/n \to t$ as $n \to \infty$, then the
approximating normal distribution does not depend on $t$, as long as $0 < t < 1$.
Hence, with this level of detail only extreme values of the key will have a
noticeable influence on the depth distribution.

The proof of asymptotic normality is based on a Poisson approximation
result (Theorem 3), where we use total variation distance. If we replace
the total variation distance by an appropriate Wasserstein metric, then a
mixed Poisson approximation is needed since with this metric shifts are
not swamped by the fact that $EX_{n,l_n} \to \infty$ as $n \to \infty$. Indeed, the
mixing distribution will asymptotically be close to a shifted and reflected
exponential distribution, with shift $2\log n + 2\gamma + \log(t(1-t))$ depending on
$t := \lim_{n\to\infty} l_n/n$ (Theorem 6; $\gamma$ denotes Euler’s constant).

These results are given in the next section. In the final section we discuss 
various consequences of our results and also relate these to the number of 
recursions needed by Hoare’s (1961) selection algorithm Find. 

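We write $\mathcal{L}(X)$ for the distribution of the random variable $X$, with
$X =_{\text{distr}} Y$ abbreviating $\mathcal{L}(X) = \mathcal{L}(Y)$, and $1_A$ for the indicator function of the set
$A$. Instead of $\mathcal{L}(X) = \mu$, with some probability distribution $\mu$, we also write
$X \sim \mu$. Distributional convergence is denoted by $\to_{\text{distr}}$ and $N(0,1)$ is the
standard normal distribution, so that $X_n \to_{\text{distr}} Z$, $Z \sim N(0,1)$ is short for

\[ \lim_{n\to\infty} P(X_n \le x) = \Phi(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy \quad \text{for all } x \in \mathbb{R}. \]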

2. Results. Our first result displays the distribution of $X_{n,l}$ in terms of
mixed and compounded standard distributions from the Bernoulli, uniform
and hypergeometric families. The representation becomes transparent once
we consider an example. Suppose we have $n = 20$ and $l = 11$. A particular
permutation is given in the first line of Table 1.

In the second line of Table 1 the part of the permutation to the left of the
element of interest is divided into those that are greater (+) or smaller (−)
than this element. The third line marks the descending (↓) and ascending
(↑) records in these sublists, where the $i$th element $x_i$ of a list $(x_1,\dots,x_n)$
of numbers is a descending record if $x_i = \min_{1\le j\le i} x_j$, ascending if
$x_i = \max_{1\le j\le i} x_j$.

Figure 1 shows the search tree corresponding to the data in Table 1. The
crucial point to note is that the path from the root of the tree to the element
of interest passes through the descending records in the “+”-list, moving to
the left, and the ascending records in the “−”-list, moving to the right.
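The correspondence between records and the search path can be checked directly on the permutation of Table 1. The sketch below is illustrative; the function names `records_before` and `search_path` are assumptions introduced here. It computes the two record lists and, independently, the keys visited when searching for the element of interest in the grown tree.

```python
def records_before(perm, key):
    """Ascending records of the "-" list and descending records of the "+" list.

    Both lists are formed from the entries of `perm` that precede `key`.
    """
    pos = perm.index(key)
    smaller = [x for x in perm[:pos] if x < key]   # the "-" list
    larger = [x for x in perm[:pos] if x > key]    # the "+" list
    ascending = [x for i, x in enumerate(smaller) if x == max(smaller[: i + 1])]
    descending = [x for i, x in enumerate(larger) if x == min(larger[: i + 1])]
    return ascending, descending


def search_path(perm, key):
    """Keys visited (excluding `key`) when searching for `key` in the tree grown from `perm`."""
    left, right, root = {}, {}, perm[0]
    for x in perm[1:]:
        current = root
        while True:
            child = left if x < current else right
            if current not in child:
                child[current] = x
                break
            current = child[current]
    path, current = [], root
    while current != key:
        path.append(current)
        current = left[current] if key < current else right[current]
    return path


TABLE_1 = [18, 1, 5, 6, 10, 20, 3, 13, 9, 17, 7, 12, 8, 16, 14, 19, 2, 11, 4, 15]

if __name__ == "__main__":
    print(records_before(TABLE_1, 11))
    print(search_path(TABLE_1, 11))
```

For this permutation the ascending records are 1, 5, 6, 10 and the descending records are 18, 13, 12, and the search path consists of exactly these seven keys.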

We recall the definition of some standard distributions: $X$ is said to have
a Bernoulli distribution with parameter $p$ if $P(X = 1) = 1 - P(X = 0) = p$,
to be uniformly distributed on the (finite) set $S$ if $P(X = s) = 1/|S|$ for all
$s \in S$, and to have a hypergeometric distribution with parameters $N$, $M$
and $n$ if

\[ P(X = k) = \binom{M}{k}\binom{N-M}{n-k} \Big/ \binom{N}{n} \quad \text{for } k = 0,\dots,n. \]

We abbreviate these to $X \sim \mathrm{Ber}(p)$, $X \sim \mathrm{unif}(S)$ and $X \sim \mathrm{HypGeo}(N; M, n)$,
respectively. By a random permutation of a finite set $S$ we always mean a
permutation that is uniformly distributed on the $|S|!$ possible values.


Table 1

A permutation and its subrecord structure for n = 20, l = 11

π         18   1   5   6  10  20   3  13   9  17   7  12   8  16  14  19   2  11   4  15
>,<        +   −   −   −   −   +   −   +   −   +   −   +   −   +   +   +   −   *
records    ↓   ↑   ↑   ↑   ↑           ↓               ↓                       *


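Theorem 1. Suppose that $N, G_{1,l},\dots,G_{n,l}, K_1, K_2, K_3,\dots, K'_1, K'_2, K'_3,\dots$
are independent random variables with $N \sim \mathrm{unif}(\{1,\dots,n\})$,
$G_{m,l} \sim \mathrm{HypGeo}(n-1; l-1, m-1)$ for $m = 1,\dots,n$ and $K_i, K'_i \sim \mathrm{Ber}(1/i)$ for all $i \in \mathbb{N}$. Then

\[ X_{n,l} =_{\text{distr}} \sum_{i=1}^{G_{N,l}} K_i + \sum_{i=1}^{N-1-G_{N,l}} K'_i. \]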


Proof. We first formalize the construction that we outlined above with
the help of an example. Remember that $n$ and $l$ are given. Let $\pi$ be a random
permutation of $\{1,\dots,n\}$ and let $N := \pi^{-1}(l)$ be the position of $l$. Further,
let

\[ S_- := \{1 \le i < N : \pi(i) < l\} = \{i_1,\dots,i_G\}, \]
\[ S_+ := \{1 \le i < N : \pi(i) > l\} = \{j_1,\dots,j_{N-1-G}\}, \]

with $i_1 < \cdots < i_G$ and $j_1 < \cdots < j_{N-1-G}$ and let

\[ \pi_- := (\pi(i_1),\dots,\pi(i_G)), \qquad \pi_+ := (\pi(j_1),\dots,\pi(j_{N-1-G})), \]

\[ R_- := \sum_{r=1}^{G} \prod_{k=1}^{r-1} 1_{\{\pi(i_k) < \pi(i_r)\}}, \qquad R_+ := \sum_{r=1}^{N-1-G} \prod_{k=1}^{r-1} 1_{\{\pi(j_k) > \pi(j_r)\}}. \]

With these constructions we have $X_{n,l} = R_- + R_+$; see Section 13.4 in
Cormen, Leiserson and Rivest (1990) for a formal proof. It remains to
verify the distributional statements. For these, we simply recall some
well-known or easily checked properties of records and random permutations;
see, for example, Arnold, Balakrishnan and Nagaraja (1998): Obviously,
$N \sim \mathrm{unif}(\{1,\dots,n\})$. Given $N = m$, $(\pi(1),\dots,\pi(m-1))$ is a random
permutation of the set $\{\pi(i) : 1 \le i < m\}$. We can view $\pi(i)$ as the result of the
$i$th draw, without replacement, from an urn with $n - 1$ balls, $l - 1$ being
“white,” meaning a result less than $l$. Hence, conditionally on $N = m$,

\[ G := |S_-| \sim \mathrm{HypGeo}(n-1; l-1, m-1). \]

Conditionally on $N = m$ and $G = k$, $\pi_-$ and $\pi_+$ are independent random
permutations of $\{\pi(i_1),\dots,\pi(i_k)\}$ and $\{\pi(j_1),\dots,\pi(j_{m-1-k})\}$, respectively.
The distributional structure of records in random permutations is such that
the products in the definition of $R_-$ and $R_+$, which indicate the presence
of a record at position $r$, are independent and Bernoulli distributed with
parameter $1/r$.

The assertion of the theorem now follows on comparing the respective
(conditional) distributions in the above decomposition to those in the
constructive representation. □
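For small $n$ the representation in Theorem 1 can be compared with a direct enumeration of all permutations. The sketch below computes the probability mass function of $X_{n,l}$ in both ways with exact fractions; the function names are illustrative assumptions, and the enumeration is feasible only for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def bernoulli_sum_pmf(m):
    """pmf (as a list) of K_1 + ... + K_m with independent K_i ~ Ber(1/i)."""
    pmf = [Fraction(1)]
    for i in range(1, m + 1):
        p = Fraction(1, i)
        new = [Fraction(0)] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)
            new[k + 1] += q * p
        pmf = new
    return pmf


def representation_pmf(n, l):
    """pmf of X_{n,l} on {0,...,n-1}, computed from the mixture in Theorem 1."""
    pmf = [Fraction(0)] * n
    for m in range(1, n + 1):                      # N = m has probability 1/n
        for g in range(0, m):                      # G_{m,l} = g, hypergeometric weight
            weight = Fraction(comb(l - 1, g) * comb(n - l, m - 1 - g),
                              comb(n - 1, m - 1)) / n
            if weight == 0:
                continue
            first, second = bernoulli_sum_pmf(g), bernoulli_sum_pmf(m - 1 - g)
            for i, pi in enumerate(first):
                for j, pj in enumerate(second):
                    pmf[i + j] += weight * pi * pj
    return pmf


def enumerated_pmf(n, l):
    """pmf of X_{n,l} obtained by growing the tree for every permutation."""
    counts, total = [0] * n, 0
    for perm in permutations(range(1, n + 1)):
        lo, hi, depth = 0, n + 1, 0
        for x in perm:
            if x == l:
                break
            if lo < x < hi:                        # x is an ancestor of l
                depth += 1
                lo, hi = (x, hi) if x < l else (lo, x)
        counts[depth] += 1
        total += 1
    return [Fraction(c, total) for c in counts]


if __name__ == "__main__":
    print(representation_pmf(5, 2))
    print(enumerated_pmf(5, 2))
```

Both computations should return the same list of probabilities for every pair $(n, l)$.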


From the proof of the theorem it is evident that the first sum in the
representation corresponds to the number of moves to the right on the path
from the root to the node containing $l$; similarly, the second sum corresponds
to the moves to the left. In this context it is interesting to note that

\[ G_{N,l} \sim \mathrm{unif}(\{0,\dots,l-1\}), \qquad N - 1 - G_{N,l} \sim \mathrm{unif}(\{0,\dots,n-l\}). \]

To see this, we simply calculate

\[ P(G_{N,l} = k) = \frac{1}{n} \sum_{m=1}^{n} P(G_{m,l} = k)
   = \frac{1}{n} \sum_{m=1}^{n} \frac{\binom{l-1}{k}\binom{n-l}{m-1-k}}{\binom{n-1}{m-1}}
   = \frac{1}{n} \sum_{m=0}^{n-1} \frac{\binom{m}{k}\binom{n-1-m}{l-1-k}}{\binom{n-1}{l-1}}
   = \frac{1}{l} \]

for $k = 0,\dots,l-1$,
using $\mathrm{HypGeo}(n-1; l-1, m-1) = \mathrm{HypGeo}(n-1; m-1, l-1)$ and one of
the basic identities for binomial coefficients given, for example, as equation
(5.26) in Graham, Knuth and Patashnik (1989). The statement on
$N - 1 - G_{N,l}$ follows from similar calculations or from symmetry considerations (see
also Section 3).
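The uniformity of the mixed hypergeometric variable $G_{N,l}$ is easy to confirm numerically. A minimal sketch, with the function name `mixed_hypergeometric_pmf` assumed here for illustration:

```python
from fractions import Fraction
from math import comb


def mixed_hypergeometric_pmf(n, l):
    """P(G_{N,l} = k) for k = 0,...,l-1.

    N is uniform on {1,...,n} and, given N = m, G_{m,l} ~ HypGeo(n-1; l-1, m-1).
    """
    pmf = []
    for k in range(l):
        total = sum(
            Fraction(comb(l - 1, k) * comb(n - l, m - 1 - k), comb(n - 1, m - 1))
            for m in range(k + 1, n + 1)    # m - 1 draws must allow k white balls
        )
        pmf.append(total / n)
    return pmf


if __name__ == "__main__":
    print(mixed_hypergeometric_pmf(20, 11))
```

For $n = 20$, $l = 11$ every entry equals $1/11$, in agreement with the calculation above.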

Note, however, that $G_{N,l}$ and $N - 1 - G_{N,l}$ are not independent; their
joint distribution, which will be used repeatedly below, is given by

\[ \text{(JD)} \qquad P(G_{N,l} = i,\ N - 1 - G_{N,l} = j) = \frac{1}{n} \frac{\binom{l-1}{i}\binom{n-l}{j}}{\binom{n-1}{i+j}}. \]

For our first approximation result we require the following bound for the
variance of $H(G) + H(N - 1 - G)$, where we have written $H(G)$ instead of
$H_G$. As usual, we put $H(0) = H_0 = 0$.


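Lemma 2. Let $G$ and $N$ be random variables with joint distribution
given by (JD). Then

\[ \mathrm{var}(H(G) + H(N - 1 - G)) \le 28. \]

Proof. Because of $\mathrm{var}(X + Y) \le \mathrm{var}(X) + \mathrm{var}(Y) + 2\,\mathrm{var}(X)^{1/2}\mathrm{var}(Y)^{1/2}$,
it is enough to bound the variance of $H(G)$ and $H(N - 1 - G)$ by 7. The
remarks following Theorem 1 imply that both can (individually) be
represented in distribution as $H(\lfloor kU \rfloor)$ with $U \sim \mathrm{unif}(0,1)$ and $k = l$ and
$k = n + 1 - l$, respectively. We may assume that $k > 1$, and then, using
Minkowski’s inequality,

\[ \mathrm{var}(H(\lfloor kU \rfloor)) \le E(H(\lfloor kU \rfloor) - \log k)^2 \]
\[ \qquad = E\bigl((H(\lfloor kU \rfloor) - \log k)\,1_{\{U < 1/k\}}\bigr)^2 + E\bigl((H(\lfloor kU \rfloor) - \log k)\,1_{\{U \ge 1/k\}}\bigr)^2 \]
\[ \qquad \le \frac{(\log k)^2}{k} + \bigl((EV_k^2)^{1/2} + (EW_k^2)^{1/2}\bigr)^2 \]

with

\[ V_k := \bigl(H(\lfloor kU \rfloor) - \log(\lfloor kU \rfloor)\bigr)\,1_{\{U \ge 1/k\}}, \qquad W_k := \log\frac{\lfloor kU \rfloor}{k}\,1_{\{U \ge 1/k\}}. \]

The first term is bounded by $4e^{-2}$; for $V_k$ we use that $|H_j - \log j| \le 1$ for all
$j \in \mathbb{N}$. Finally,

\[ EW_k^2 = \frac{1}{k} \sum_{i=1}^{k-1} \Bigl(\log\frac{i}{k}\Bigr)^2 \le \int_0^1 (\log x)^2\,dx = 2, \]

which gives $\mathrm{var}(H(\lfloor kU \rfloor)) \le 4e^{-2} + (1 + \sqrt{2})^2 < 7$. □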
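The bound used in the proof is far from tight, which a short numerical experiment makes visible. The sketch below (function name assumed for illustration) evaluates the variance of $H(\lfloor kU \rfloor)$ in floating point, using that $\lfloor kU \rfloor$ is uniformly distributed on $\{0,\dots,k-1\}$.

```python
def variance_of_harmonic_uniform(k):
    """Variance of H(J) where J is uniform on {0,...,k-1} and H(0) = 0."""
    h, first_moment, second_moment = 0.0, 0.0, 0.0
    for j in range(k):
        if j > 0:
            h += 1.0 / j              # h now equals the harmonic number H_j
        first_moment += h
        second_moment += h * h
    mean = first_moment / k
    return second_moment / k - mean * mean


if __name__ == "__main__":
    for k in (2, 10, 100, 1000, 10000):
        print(k, variance_of_harmonic_uniform(k))
```

The computed values stay well below the bound 7 used for Lemma 2, which is all the argument requires.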


Our first result shows that the distribution of $X_{n,l}$ can be approximated
by a Poisson distribution with the same mean; it comes with an explicit error
bound. Recall that the total variation distance of two probability measures
$\mu$ and $\nu$ concentrated on $\mathbb{N}_0$ is given by

\[ d_{TV}(\mu,\nu) := \sup_{A \subset \mathbb{N}_0} |\mu(A) - \nu(A)| = \frac{1}{2} \sum_{k=0}^{\infty} |\mu(\{k\}) - \nu(\{k\})|. \]

Further, for a probability measure $\nu$ concentrated on the nonnegative half
line $[0,\infty)$ we write $\mathrm{MixPo}(\nu)$ for the mixed Poisson distribution with mixing
measure $\nu$, that is,

\[ \mathrm{MixPo}(\nu)(\{k\}) = \int e^{-\lambda} \frac{\lambda^k}{k!}\,\nu(d\lambda) \quad \text{for all } k \in \mathbb{N}_0. \]

With $\nu = \delta_\lambda$, the one-point measure on $\lambda \ge 0$, we obtain the usual Poisson
distribution $\mathrm{Po}(\lambda)$. This also holds for $\lambda = 0$ as we interpret $\mathrm{Po}(0)$ as $\delta_0$.
Theorem 3. With the above notation,

\[ \sup_{l \in \{1,\dots,n\}} d_{TV}\bigl(\mathcal{L}(X_{n,l}), \mathrm{Po}(EX_{n,l})\bigr) \le \frac{[\,\cdots]}{\log n} \]

for all $n \ge 2$.


Proof. We first give a conditional approximation by a Poisson
distribution which leads to an approximation by a mixed Poisson distribution.
The latter will then be approximated by a Poisson distribution with the
same mean.

We use the following fundamental Poisson approximation result: If $X_1,\dots,X_n$
are independent with $X_i \sim \mathrm{Ber}(p_i)$, then

\[ d_{TV}\Bigl(\mathcal{L}\Bigl(\sum_{i=1}^{n} X_i\Bigr), \mathrm{Po}\Bigl(\sum_{i=1}^{n} p_i\Bigr)\Bigr) \le \Bigl(\sum_{i=1}^{n} p_i\Bigr)^{-1} \sum_{i=1}^{n} p_i^2; \]

see, for example, page 8 in Barbour, Holst and Janson (1992). Together
with the representation in Theorem 1 this immediately implies the following
bound for the Poisson approximation of the conditional distributions:

\[ d_{TV}\bigl(\mathcal{L}(X_{n,l} \mid G = i,\ N - 1 - G = j), \mathrm{Po}(H_i + H_j)\bigr) \le \frac{\pi^2}{3H_{i+j}} \]

for $i + j > 0$; for $i = j = 0$ the distance is 0. Note that $i + j$ corresponds to
$N - 1$, which is uniformly distributed on $\{0,\dots,n-1\}$. The unconditioning
step therefore leads to

\[ d_{TV}\bigl(\mathcal{L}(X_{n,l}), \mathrm{MixPo}(\mu_{n,l})\bigr) \le \frac{\pi^2}{3n} \sum_{k=1}^{n-1} \frac{1}{H_k}, \]

where $\mu_{n,l}$ denotes the distribution of $H(G) + H(N - 1 - G)$. Standard
elementary arguments show that $\sum_{k=1}^{n-1} 1/H_k \le 3n/\log n$ for $n \ge 2$.

A mixed Poisson distribution can be approximated by an ordinary Poisson
distribution with the same mean. Using total variation distance we have,
according to Theorem 1.C(ii) in Barbour, Holst and Janson (1992),

\[ d_{TV}\bigl(\mathrm{MixPo}(\mu_{n,l}), \mathrm{Po}(EX_{n,l})\bigr) \le \frac{\sigma_{n,l}^2}{EX_{n,l}}, \]

with $\sigma_{n,l}^2$ the variance associated with $\mu_{n,l}$. Here we have used that the
expectation associated with $\mu_{n,l}$ is equal to $EX_{n,l}$. An appeal to Lemma 2 and
the triangle inequality now completes the proof. □
We do not claim that the numerical values in the bound are tight; for us,
the more important aspect is the fact that the bound does not depend on
$l$. In particular, with $(l_n)_{n\in\mathbb{N}}$ a sequence of integers with $1 \le l_n \le n$ for all
$n \in \mathbb{N}$, but completely arbitrary otherwise, and $Y_n \sim \mathrm{Po}(EX_{n,l_n})$,

\[ \Bigl| P\Bigl(\frac{X_{n,l_n} - EX_{n,l_n}}{\sqrt{EX_{n,l_n}}} \le x\Bigr) - \Phi(x) \Bigr|
   \le d_{TV}\bigl(\mathcal{L}(X_{n,l_n}), \mathcal{L}(Y_n)\bigr) + \Bigl| P\Bigl(\frac{Y_n - EY_n}{\sqrt{EY_n}} \le x\Bigr) - \Phi(x) \Bigr| \]

for all $x \in \mathbb{R}$, so the asymptotic normality of Poisson distributions with
parameter tending to infinity and the bound in Theorem 3 together imply
that

\[ \frac{X_{n,l_n} - EX_{n,l_n}}{\sqrt{EX_{n,l_n}}} \to_{\text{distr}} Z, \qquad Z \sim N(0,1), \]

as $n \to \infty$. [In fact, combining this with the Berry-Esseen theorem we obtain
the rate $O((\log n)^{-1/2})$ for the Kolmogorov-Smirnov distance.] Special cases
can be obtained on using Arora and Dent’s formula (AD). For example, if

\[ \text{(SC)} \qquad \lim_{n\to\infty} \frac{\min\{\log(l_n), \log(n - l_n)\}}{\log n} = t \]

for some $t \in [0,1]$, then

\[ \frac{X_{n,l_n} - (1+t)\log n}{\sqrt{(1+t)\log n}} \to_{\text{distr}} Z, \qquad Z \sim N(0,1). \]

In particular, if $l_n/n \to t \in (0,1)$, then $(X_{n,l_n} - 2\log n)/\sqrt{2\log n}$ is
asymptotically standard normal, irrespective of the value of $t$.

Louchard (1987) showed that, with $U_n \sim \mathrm{unif}(\{1,\dots,n\})$ independent of
the search trees,

\[ \frac{X_{n,U_n} - 2\log n}{\sqrt{2\log n}} \to_{\text{distr}} Z, \qquad Z \sim N(0,1). \]

This can now be derived from (SC) via the representation $U_n = \lceil nU \rceil$ with
$U \sim \mathrm{unif}(0,1)$ by conditioning on $U = t \in (0,1)$. (A conditioning argument
can also be used to extend the bound in Theorem 3 to randomly chosen
$l$-indices.) The special case also makes precise the intuitive picture that nodes
with extreme keys, that is, with $l$ being close to 1 or $n$, have lesser depth
and will be found faster than those “within” the range from 1 to $n$.

In order to see the influence of the key on the node depth in the midrange,
by which we mean that $l_n/n \to t$ for some $t$ with $0 < t < 1$, we have to use a
different metric for probability distributions. This becomes obvious as soon
as we expand $EX_{n,l_n}$ up to constants, since in an asymptotic normality result
constant shifts do not matter asymptotically if the scaling factors tend to
infinity. If we use the total variation distance, this even holds on the Poisson
approximation level as

\[ \lim_{\lambda\to\infty} d_{TV}(\mathrm{Po}(\lambda + c), \mathrm{Po}(\lambda)) = 0 \quad \text{for all } c > 0. \]


Our second result shows that with a suitable Wasserstein metric shifts do
become visible. There are two consequences: We now need a mixed Poisson
distribution as approximating measure, and we lose on the rate side.
Following Barbour, Holst and Janson (1992), we consider the distance $d_W$ for
probability distributions $\mu$, $\nu$ on (the Borel subsets of) the real line defined
by

\[ d_W(\mu,\nu) := \sup\Bigl\{ \Bigl| \int f\,d\mu - \int f\,d\nu \Bigr| : \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|} \le 1 \Bigr\}. \]

For distributions concentrated on the nonnegative integers it can be shown
that

\[ d_W(\mu,\nu) = \sum_{k=0}^{\infty} |\mu([k,\infty)) - \nu([k,\infty))|. \]

Hence, if $X$ and $Y$ are random variables with distributions $\mu$ and $\nu$,
respectively, then $d_W(\mu,\nu) \ge |EX - EY|$, which in turn implies that $\mathrm{Po}(\lambda + c)$ and
$\mathrm{Po}(\lambda)$ remain distinguishable under this distance if $\lambda \to \infty$, $c > 0$ fixed (we
generally use $d_W$ only in connection with distributions with finite mean).
Further, $d_W$ can be realized by a suitable coupling in the sense that

\[ d_W(\mu,\nu) = \min\{E|X - Y| : X \sim \mu,\ Y \sim \nu\}. \]
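The different behavior of the two distances under a constant shift of the Poisson parameter can be illustrated numerically. The following sketch uses truncated probability mass functions and the two sum formulas given above; the function names and the truncation rule are assumptions made here for illustration.

```python
from math import exp, sqrt


def poisson_pmf(lam, size):
    """Po(lam) probabilities for k = 0,...,size-1 via p_k = p_{k-1} * lam / k."""
    pmf = [exp(-lam)]
    for k in range(1, size):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def total_variation(p, q):
    """Half the sum of absolute differences of the point masses."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def wasserstein(p, q):
    """Sum over k of |P[k, inf) - Q[k, inf)| for pmfs on {0, 1, ...}."""
    distance, tail_p, tail_q = 0.0, sum(p), sum(q)
    for a, b in zip(p, q):
        distance += abs(tail_p - tail_q)
        tail_p -= a
        tail_q -= b
    return distance


def shifted_poisson_distances(lam, c):
    """Return (d_TV, d_W) between Po(lam + c) and Po(lam), truncated far in the tail."""
    size = int(lam + c + 20 * sqrt(lam + c) + 50)
    p, q = poisson_pmf(lam + c, size), poisson_pmf(lam, size)
    return total_variation(p, q), wasserstein(p, q)


if __name__ == "__main__":
    for lam in (4.0, 25.0, 100.0, 400.0):
        print(lam, shifted_poisson_distances(lam, 1.0))
```

With $c = 1$ the total variation distance shrinks as $\lambda$ grows, while the Wasserstein distance stays at (numerically) $c$, the difference of the means.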


The following lemma contains two properties of the Wasserstein distance;
their proof makes use of the above alternative expressions for $d_W$. When
we use the first of these below we will speak of unconditioning; a similar
property for the total variation distance has already been used in the proof
of Theorem 3. The second property shows that $\mu \mapsto \mathrm{MixPo}(\mu)$ is a weak
$d_W$-contraction.


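Lemma 4. (a) If $X$ with $P(X \in \mathbb{N}_0) = 1$ and $Y$ are random variables such
that

\[ d_W\bigl(\mathcal{L}(X \mid Y = y), \mathrm{Po}(\phi(y))\bigr) \le f(y) \]

for all $y$, with measurable functions $\phi$ and $f$, then

\[ d_W\bigl(\mathcal{L}(X), \mathrm{MixPo}(\mathcal{L}(\phi(Y)))\bigr) \le Ef(Y). \]

(b) For any two probability distributions $\mu$, $\nu$ on the nonnegative real line,

\[ d_W(\mathrm{MixPo}(\mu), \mathrm{MixPo}(\nu)) \le d_W(\mu,\nu). \]

Proof. (a) We condition on the value of $Y$; $\int \cdots\,\mathcal{L}(Y)(dy)$ means that
we integrate with respect to the distribution of $Y$:

\[ d_W\bigl(\mathcal{L}(X), \mathrm{MixPo}(\mathcal{L}(\phi(Y)))\bigr)
   = \sum_{k=0}^{\infty} \Bigl| \int \bigl( \mathcal{L}(X \mid Y = y)([k,\infty)) - \mathrm{Po}(\phi(y))([k,\infty)) \bigr)\,\mathcal{L}(Y)(dy) \Bigr| \]
\[ \qquad \le \int \sum_{k=0}^{\infty} \bigl| \mathcal{L}(X \mid Y = y)([k,\infty)) - \mathrm{Po}(\phi(y))([k,\infty)) \bigr|\,\mathcal{L}(Y)(dy) \]
\[ \qquad = \int d_W\bigl(\mathcal{L}(X \mid Y = y), \mathrm{Po}(\phi(y))\bigr)\,\mathcal{L}(Y)(dy). \]

(b) Let $(N_t)_{t \ge 0}$ be a unit rate Poisson process and let $X$ and $Y$ be random
variables, independent of the process, with $X \sim \mu$, $Y \sim \nu$ and $d_W(\mu,\nu) =
E|X - Y|$. Then $N_X \sim \mathrm{MixPo}(\mu)$, $N_Y \sim \mathrm{MixPo}(\nu)$ so that by conditioning
on $X$ and $Y$ and considering the cases $X \ge Y$ and $X < Y$ separately,

\[ d_W(\mathrm{MixPo}(\mu), \mathrm{MixPo}(\nu)) \le E|N_X - N_Y| = E\bigl(E[|N_X - N_Y| \mid X, Y]\bigr) = E|X - Y| = d_W(\mu,\nu). \quad □ \]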

We also need an elementary estimate related to hypergeometric
distributions.


Lemma 5. With $X \sim \mathrm{HypGeo}(N; M, n)$,

\[ E\Bigl[ \Bigl| \log\frac{X}{EX} \Bigr|\,1_{\{X > 0\}} \Bigr] \le \frac{4N\log N}{nM} + 2\sqrt{\frac{N}{nM}}. \]

Proof. We use

\[ EX = \frac{nM}{N}, \qquad \mathrm{var}(X) \le \frac{nM}{N}, \]

together with Chebyshev’s inequality, the bound $\log N$ for the integrand,
the fact that $|\log(1 + x)| \le 2|x|$ on $|x| \le 1/2$, and $E|X - EX| \le \sqrt{\mathrm{var}(X)}$ to
obtain

\[ E\Bigl[ \Bigl| \log\frac{X}{EX} \Bigr|\,1_{\{X > 0\}} \Bigr]
   \le (\log N)\,P\Bigl( |X - EX| > \frac{EX}{2} \Bigr) + 2E\Bigl| \frac{X}{EX} - 1 \Bigr|
   \le \frac{4N\log N}{nM} + 2\sqrt{\frac{N}{nM}}. \quad □ \]

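We can now state and prove our second approximation result for key
values in the central range.

Theorem 6. Suppose that $l_n$ varies with $n$ such that

\[ \frac{l_n}{n} = t + O\Bigl(\frac{1}{\sqrt{\log n}}\Bigr) \]

with some $t \in (0,1)$. Let

\[ \nu_{n,t} := \mathcal{L}\bigl( (2\log n + 2\gamma + \log(t(1-t)) - 2X)^+ \bigr), \]

where $X$ is exponentially distributed with mean 1. Then

\[ d_W\bigl(\mathcal{L}(X_{n,l_n}), \mathrm{MixPo}(\nu_{n,t})\bigr) = O\Bigl(\frac{1}{\sqrt{\log n}}\Bigr). \]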

Proof. We continue to use the notation introduced in the proof of
Theorem 3 and again begin by comparing conditional distributions to Poisson
distributions. The basic result for the Wasserstein distance, obtained by
combining Lemma 1.1.5 and Remark 1.1.7 in Barbour, Holst and Janson
(1992), is the following: If $X_1,\dots,X_n$ are independent with $X_i \sim \mathrm{Ber}(p_i)$,
then

‘TibS'' 

In our situation we obtain with the representation in Theorem 1,
abbreviating $G_{N_n,l_n}$ to $G_n$,

\[ d_W\bigl(\mathcal{L}(X_{n,l_n} \mid G_n = i,\ N_n - 1 - G_n = j), \mathrm{Po}(H_i + H_j)\bigr) \]

< ^ ^ <r 

- + “ 3 /^' 

Unconditioning and using (logn)= 0(n), we see that 

\[ d_W\bigl(\mathcal{L}(X_{n,l_n}), \mathrm{MixPo}(\mu_n)\bigr) = O\bigl((\log n)^{-1/2}\bigr), \]

where $\mu_n := \mathcal{L}(H(G_n) + H(N_n - 1 - G_n))$. Using the triangle inequality and
Lemma 4(b) we see that it remains to show that $d_W(\mu_n, \nu_{n,t}) = O((\log n)^{-1/2})$.
This will follow if we can find random variables $X_n$ and $Y_n$ such that $\mathcal{L}(X_n) =
\mu_n$, $\mathcal{L}(Y_n^+) = \nu_{n,t}$ and $\sqrt{\log n}\,E|X_n - Y_n| = O(1)$. (Because of $X_n \ge 0$, going
from $Y_n$ to $Y_n^+$ will not increase the Wasserstein distance to the distribution
of $X_n$.) Let $U \sim \mathrm{unif}(0,1)$ and $N_n := \lceil nU \rceil$ for all $n \in \mathbb{N}$. With

\[ X_n := H(G_n) + H(N_n - 1 - G_n), \]
\[ Y_n := 2\log n + 2\gamma + \log(t(1-t)) + 2\log U, \]

the distributional requirements are satisfied and we have

\[ |X_n - Y_n| \le \sum_{i=1}^{6} |Z_{i,n}| \]

with

\[ Z_{1,n} := [H(G_n) - \log(G_n)]\,1_{\{G_n > 0\}} - \gamma, \]
\[ Z_{2,n} := \log(G_n)\,1_{\{G_n > 0\}} - \log(l_n - 1) - \log\frac{N_n}{n}, \]
\[ Z_{3,n} := \log(l_n - 1) - \log(nt) + \log\frac{N_n}{n} - \log U, \]
\[ Z_{4,n} := [H(N_n - 1 - G_n) - \log(N_n - 1 - G_n)]\,1_{\{N_n - 1 - G_n > 0\}} - \gamma, \]
\[ Z_{5,n} := \log(N_n - 1 - G_n)\,1_{\{N_n - 1 - G_n > 0\}} - \log(n - 1 - l_n) - \log\frac{N_n}{n}, \]
\[ Z_{6,n} := \log(n - 1 - l_n) - \log(n(1-t)) + \log\frac{N_n}{n} - \log U. \]

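For the first of these we use the fact that, for some constant $C < \infty$,

\[ |H_n - \log n - \gamma| \le \frac{C}{n} \quad \text{for all } n \in \mathbb{N}, \]

and $\mathcal{L}(G_n) = \mathrm{unif}(\{0,\dots,l_n - 1\})$ to obtain

\[ E|Z_{1,n}| \le \gamma P(G_n = 0) + E\Bigl[ \frac{C}{G_n}\,1_{\{G_n > 0\}} \Bigr]
   = \frac{\gamma}{l_n} + \frac{C}{l_n} H_{l_n - 1} = O\Bigl(\frac{\log n}{n}\Bigr). \]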

The second term is slightly more complicated as it involves both $G_n$ and
$N_n$. Conditioning on the latter we get

\[ E|Z_{2,n}| \le E\bigl(E[|Z_{2,n}| \mid N_n]\bigr). \]

On $\{N_n = 1\}$ we have $G_n = 0$, which leads to

\[ E[|Z_{2,n}| \mid N_n = 1] = \log\Bigl(\frac{n}{l_n - 1}\Bigr) = O(1). \]

Together with $P(N_n = 1) = 1/n$ this gives $E|Z_{2,n}|\,1_{\{N_n = 1\}} = O(1/n)$. We
may therefore assume that $N_n > 1$ as long as we deal with $Z_{2,n}$.
We use another decomposition,

\[ |Z_{2,n}| \le Z_{2,1,n} + Z_{2,2,n} + Z_{2,3,n} \]

with

\[ Z_{2,1,n} = \Bigl| \log\frac{G_n}{E[G_n \mid N_n]} \Bigr|\,1_{\{G_n > 0\}}, \]
\[ Z_{2,2,n} = \Bigl| \log\frac{N_n - 1}{n - 1} - \log\frac{N_n}{n} \Bigr|, \]
\[ Z_{2,3,n} = \Bigl| \log\frac{N_n(l_n - 1)}{n} \Bigr|\,1_{\{G_n = 0\}}. \]


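Lemma 5 yields

\[ E\Bigl[ \Bigl| \log\frac{G_n}{E[G_n \mid N_n]} \Bigr|\,1_{\{G_n > 0\}} \Bigm| N_n \Bigr]
   \le \frac{4(n-1)\log(n-1)}{(N_n - 1)(l_n - 1)} + 2\sqrt{\frac{n-1}{(N_n - 1)(l_n - 1)}} \]

on $N_n > 1$, which together with

\[ \frac{1}{n} \sum_{k=1}^{n-1} \frac{\log(n-1)}{k} = O\Bigl(\frac{1}{\sqrt{\log n}}\Bigr), \qquad
   \frac{1}{n} \sum_{k=1}^{n-1} \frac{1}{\sqrt{k}} = O\Bigl(\frac{1}{\sqrt{\log n}}\Bigr), \]

gives $EZ_{2,1,n}\,1_{\{N_n > 1\}} = O((\log n)^{-1/2})$. For $Z_{2,2,n}$ we obtain

\[ EZ_{2,2,n}\,1_{\{N_n > 1\}} = \frac{1}{n} \sum_{k=2}^{n} \Bigl| \log\frac{k-1}{n-1} - \log\frac{k}{n} \Bigr| \]
\[ \qquad \le \frac{1}{n} \sum_{k=2}^{n} (\log k - \log(k-1)) + \frac{1}{n} \sum_{k=2}^{n} (\log n - \log(n-1)) \]
\[ \qquad = \frac{1}{n}\log n + \frac{n-1}{n}\log\frac{n}{n-1} = O\Bigl(\frac{\log n}{n}\Bigr). \]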

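On $\{N_n > \sqrt{n}\}$ we have

\[ P(G_n = 0 \mid N_n) \le \kappa^{\sqrt{n}} \]

for some $\kappa < 1$ and $n$ large enough, hence

\[ EZ_{2,3,n}\,1_{\{N_n > 1\}} = E\bigl(E[Z_{2,3,n} \mid N_n]\,1_{\{1 < N_n \le \sqrt{n}\}}\bigr) + E\bigl(E[Z_{2,3,n} \mid N_n]\,1_{\{N_n > \sqrt{n}\}}\bigr) \]
\[ \qquad \le \frac{1}{n} \sum_{k=2}^{\lfloor\sqrt{n}\rfloor} \Bigl| \log\frac{k(l_n - 1)}{n} \Bigr| + \frac{\kappa^{\sqrt{n}}}{n} \sum_{k=\lfloor\sqrt{n}\rfloor + 1}^{n} \Bigl| \log\frac{k(l_n - 1)}{n} \Bigr|. \]

Both terms on the right-hand side are obviously $O((\log n)^{-1/2})$ so that this
rate also holds for $EZ_{2,3,n}\,1_{\{N_n > 1\}}$ and therefore for $E|Z_{2,n}|$ too.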

For $Z_{3,n}$ we use the rate condition on $l_n/n - t$ together with the following
argument which is based on the construction of $N_n$:

\[ E\Bigl| \log\frac{N_n}{n} - \log U \Bigr| = E\log\frac{N_n}{n} - E\log U
   = \frac{1}{n} \sum_{k=1}^{n} \log\frac{k}{n} + 1
   = \frac{1}{n}\log(n!) - \log n + 1 = O\Bigl(\frac{\log n}{n}\Bigr). \]

Finally, adapting the arguments used for $Z_{i,n}$ to $Z_{i+3,n}$, $i = 1,2,3$, is a
straightforward task. □


3. Miscellaneous comments. We relate our findings to another classical
algorithm in Section 3.1. In Section 3.2 we discuss the expectation and the
variance of $X_{n,l}$. The use of (and need for) other probability metrics, together
with the relationship between total variation and Wasserstein distance, are
briefly considered in Section 3.3. The final subsection deals with another
noteworthy aspect of the representation of $X_{n,l}$ as the sum of the number
of moves to the right and the number of moves to the left.


3.1. A situation very similar to the one considered above arises in
connection with Hoare’s (1961) selection algorithm Find, a randomized
divide-and-conquer algorithm that selects the $l$th smallest element of a totally
ordered set $S$ of size $n$ in a recursive manner: First, an $x$ from $S$ is chosen
uniformly at random. Comparing this element to all others, we obtain the
subsets $S_- := \{y \in S : y < x\}$ and $S_+ := \{y \in S : y > x\}$. We continue with
$(l, S)$ replaced by $(l, S_-)$ if the size $k := |S_-|$ is greater than or equal to $l$
and with $(l - 1 - k, S_+)$ if $k < l - 1$. If $k = l - 1$, then we stop and return
$x$. For the time required by the algorithm the number of comparisons $C_{n,l}$
is most important, but the number $R_{n,l}$ of recursions has also been
investigated. Instead of introducing randomness via the selection of the pivotal
element, we can equivalently assume that the data are random, with all
permutations being equally likely, that we operate on lists rather than sets and
that we always choose the first element of the list as the pivot. This connects
Find to binary search trees, with $S_-$ and $S_+$ corresponding to the left and
right subtree, respectively, and indeed, it is well known that $R_{n,l}$ is equal in
distribution to $X_{n,l}$ (or to $1 + X_{n,l}$ if we include the initial step).
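The list version of Find with the first element as pivot can be sketched as follows. The function name `find_recursions` is an assumption made here; the sketch counts the recursive calls after the initial step, so that the count can be compared with the node depth.

```python
def find_recursions(items, l):
    """Hoare's Find on a list of distinct values, using the first element as pivot.

    Returns the l-th smallest element and the number of recursive calls made
    after the initial step.
    """
    recursions = 0
    while True:
        pivot = items[0]
        smaller = [y for y in items[1:] if y < pivot]   # S_-, in list order
        larger = [y for y in items[1:] if y > pivot]    # S_+, in list order
        k = len(smaller)
        if k == l - 1:
            return pivot, recursions
        if k >= l:
            items = smaller                  # continue with (l, S_-)
        else:
            items, l = larger, l - 1 - k     # continue with (l - 1 - k, S_+)
        recursions += 1


TABLE_1 = [18, 1, 5, 6, 10, 20, 3, 13, 9, 17, 7, 12, 8, 16, 14, 19, 2, 11, 4, 15]

if __name__ == "__main__":
    print(find_recursions(TABLE_1, 11))
```

For the permutation of Table 1 and $l = 11$ the successive pivots are 18, 1, 5, 6, 10, 13, 12 and finally 11, so the algorithm needs seven recursions, the same number as the depth of the node containing 11 in the corresponding search tree.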

Again, details are given in many of the standard textbooks; see also the
recent book by Mahmoud (2000). As with binary search trees, if interest is
in the behavior of these quantities for $n$ large, one can average out the $l$.
This leads to results on the number of comparisons and recursions needed
for a randomly chosen $l$; see, for example, Section 7.5 in Mahmoud (2000)
and the references given there. Instead, Grübel and Rösler (1996) considered
the whole function $l \mapsto C_{n,l}$. The resulting limit theorem for the stochastic
processes $(C_{n,\lceil nt \rceil})_{0 \le t \le 1}$ implies the distributional convergence of $C_{n,l_n}/n$ if
the sequence $(l_n)_{n\in\mathbb{N}}$ is such that $l_n/n \to t$ as $n \to \infty$ for some $t \in [0,1]$; the
limit distribution depends on $t$. A different approach, leading to this result
more easily, is given in Grübel (1998). The results in the previous section
cover similar aspects for the number of recursions required. In particular,
the terms $\sum_{i=1}^{G_{N,l}} K_i$ and $\sum_{i=1}^{N-1-G_{N,l}} K'_i$ in Theorem 1 represent the number
of times that the element of interest is put into $S_+$ and $S_-$, respectively,
in the course of the algorithm. It is interesting to note that, in contrast
to the situation with the number of comparisons, we have concentration of
mass for the number of recursions in the sense that $R_{n,l_n}/ER_{n,l_n}$ converges
to 1 in probability. An analogue to the result in Grübel and Rösler (1996)
would be a functional limit theorem for the “depth plot” $l \mapsto X_{n,l}$ which,
incidentally, characterizes the binary search tree. Figure 2 shows this plot
for the permutation in Table 1.
3.2. The representation in Theorem 1 leads to an alternative proof of
(AD). Let $Y := \sum_{i=1}^{G_{N,l}} K_i$, $Z := \sum_{i=1}^{N-1-G_{N,l}} K'_i$, with the notation as in
Theorem 1. Using $G_{N,l} \sim \mathrm{unif}(\{0,\dots,l-1\})$, $EK_i = 1/i$ and equation (6.67) in
Graham, Knuth and Patashnik (1989), we obtain

\[ EY = \frac{1}{l} \sum_{j=0}^{l-1} \sum_{i=1}^{j} \frac{1}{i} = \frac{1}{l} \sum_{j=0}^{l-1} H_j = H_l - 1. \]

Together with a similar calculation for $Z$, this gives

\[ EX_{n,l} = EY + EZ = H_l + H_{n+1-l} - 2. \]


The variance of $X_{n,l}$ is mentioned in Arora and Dent (1969); the explicit
formula


(KP) 


YSiT{Xn,l) = 


2 (n + l) 
/(n + 1 — /) 
- - H 


Hn+[I- 
( 2 ) 


n+1—/ 


+ 


2 (n + l) 
l{n + 1 — 1) 

^ +2 


{Hi + H, 


n+l —/) 


l{n -|- 1 — ^) 


with $H_k^{(2)} := \sum_{i=1}^{k} 1/i^2$, is given in Kirschenhofer and Prodinger (1998).
Obtaining this from our representation is a somewhat tedious task that boils
down to an unsightly formula involving harmonic numbers and a multitude
of binomial coefficients. In contrast to the situation with $EX_{n,l}$, this does
not seem to lead to an intuitive or short proof, so we do not give the details.


3.3. We have pointed out in Section 2 that the total variation distance
will not distinguish between, say, $\mathrm{Po}(\lambda)$ and $\mathrm{Po}(\lambda + c)$ with $c$ constant as
$\lambda \to \infty$, so we may have $d_{TV}(\mathcal{L}(X_n), \mathcal{L}(Y_n)) \to 0$ even if $EX_n - EY_n$ does not
vanish asymptotically as $n \to \infty$. For general distributions on the real line
we may conversely have a small Wasserstein distance together with a large
total variation distance, but for distributions concentrated on the integers
the simple relation

\[ \mu(\{k\}) = \mu([k,\infty)) - \mu([k+1,\infty)) \]

implies that

\[ d_{TV}(\mu,\nu) \le d_W(\mu,\nu). \]

Using $d_W$ instead of $d_{TV}$, we obtained an approximation that is
asymptotically correct with respect to first moments in the sense that
$\lim_{n\to\infty} d_W(\mathcal{L}(X_n), \mathcal{L}(Y_n)) = 0$ implies $\lim_{n\to\infty}(EX_n - EY_n) = 0$.
From (KP) and some straightforward
calculations it follows that we would need yet another metric and a more
detailed expansion to obtain an approximation that is asymptotically
correct for second moments too; see, for example, the metric used in Mahmoud
and Neininger (2003).
Fig. 3. The scatterplot for the permutation in Table 1 (•: records in the
subpermutations).


3.4. The simplification for the two constituent parts of $X_{n,l}$ that we used
in Section 3.2 has the following noteworthy consequence: The distribution
of $\sum_{i=1}^{G_{N,l}} K_i$, with the assumptions as in Theorem 1, is equal to that of
$\sum_{i=1}^{l} K_i - 1$, which makes the random summation index disappear. With
$X^{\leftarrow}_{n,l}$ and $X^{\rightarrow}_{n,l}$ for the number of moves to the left and right, respectively,
this means that

\[ \mathcal{L}(1 + X^{\rightarrow}_{n,l}) = \mathcal{L}\Bigl(\sum_{i=1}^{l} K_i\Bigr), \qquad
   \mathcal{L}(1 + X^{\leftarrow}_{n,l}) = \mathcal{L}\Bigl(\sum_{i=1}^{n+1-l} K_i\Bigr) \]

with $K_1, K_2, \dots$ independent and $K_i \sim \mathrm{Ber}(1/i)$. Since $X_{n,l} = X^{\leftarrow}_{n,l} + X^{\rightarrow}_{n,l}$,
this leads to another proof of (AD).

A glance at Figure 3 explains the “distributional coincidence”: $1 + X^{\rightarrow}_{n,l}$
is the number of ascending records in the subpermutation of the $l$ elements
that are less than or equal to $l$, $1 + X^{\leftarrow}_{n,l}$ is the number of descending records
in the subpermutation of the $n + 1 - l$ elements that are greater than or equal
to $l$. This leads to a very simple description of the node depth distribution
in the extreme cases,

\[ \mathcal{L}(1 + X_{n,1}) = \mathcal{L}(1 + X_{n,n}) = \mathcal{L}\Bigl(\sum_{i=1}^{n} K_i\Bigr), \]

since for the minimum and maximum all steps are in one direction only. Note,
however, that despite the independence of the subpermutations of the
elements that are strictly smaller, respectively larger, than $l$, $X^{\leftarrow}_{n,l}$ and $X^{\rightarrow}_{n,l}$ are
not independent. Indeed, since $\mathcal{L}(X^{\rightarrow}_{n,l}) = \mathcal{L}(X_{l,l})$ and $\mathcal{L}(X^{\leftarrow}_{n,l}) = \mathcal{L}(X_{n+1-l,1})$,
it is tempting to think of $X_{n,l}$ as the sum of $X_{l,l}$ and $X_{n+1-l,1}$, but the
simplest nontrivial case already provides a counterexample to the
assumption that these can be taken to be independent: $\mathcal{L}(X_{3,2}) = \mathrm{unif}(\{0,1,2\})$,
$\mathcal{L}(X_{2,1}) = \mathcal{L}(X_{2,2}) = \mathrm{unif}(\{0,1\})$.
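The counterexample can be verified by enumerating the six permutations of $\{1,2,3\}$. In the sketch below the helper names `depth_pmf` and `convolve` are chosen here for illustration; the convolution is the distribution the sum would have if the two parts were independent.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def depth_pmf(n, l):
    """Exact distribution of X_{n,l}, by enumeration of all n! permutations."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        lo, hi, depth = 0, n + 1, 0
        for x in perm:
            if x == l:
                break
            if lo < x < hi:              # x becomes an ancestor of l
                depth += 1
                lo, hi = (x, hi) if x < l else (lo, x)
        counts[depth] += 1
    total = sum(counts.values())
    return {d: Fraction(c, total) for d, c in sorted(counts.items())}


def convolve(p, q):
    """Distribution of the sum of independent variables with pmfs p and q."""
    out = Counter()
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] += pa * pb
    return dict(out)


if __name__ == "__main__":
    print(depth_pmf(3, 2))
    print(convolve(depth_pmf(2, 2), depth_pmf(2, 1)))
```

The depth $X_{3,2}$ is uniform on $\{0,1,2\}$, whereas the sum of two independent variables that are uniform on $\{0,1\}$ puts mass $1/4$, $1/2$, $1/4$ on these values.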

Acknowledgment. We thank the referee for drawing our attention to 
Devroye and Neininger (2004). 

REFERENCES 

Arnold, B. C., Balakrishnan, N. and Nagaraja, H. N. (1998). Records. Wiley, New
York. MR1628157

Arora, S. and Dent, W. (1969). Randomized binary search technique. Comm. ACM 12
77-80.

Barbour, A. D., Holst, L. and Janson, S. (1992). Poisson Approximation. Clarendon
Press, Oxford. MR1163825

Cormen, Th. H., Leiserson, Ch. E. and Rivest, R. L. (1990). Introduction to
Algorithms. MIT Press. MR1066870

Devroye, L. (1988). Applications of the theory of records in the study of random trees.
Acta Inform. 26 123-130. MR969872

Devroye, L. and Neininger, R. (2004). Distances and finger search in random binary
search trees. SIAM J. Comput. 33 647-658. MR2066647

Graham, R. L., Knuth, D. E. and Patashnik, O. (1989). Concrete Mathematics, 2nd
ed. Addison-Wesley, Reading, MA. MR1001562

Grübel, R. (1998). Hoare’s selection algorithm: A Markov chain approach. J. Appl.
Probab. 35 36-45. MR1622443

Grübel, R. and Rösler, U. (1996). Asymptotic distribution theory for Hoare’s selection
algorithm. Adv. in Appl. Probab. 28 252-269. MR1372338

Hoare, C. A. R. (1961). Algorithm 63: Partition, Algorithm 64: Quicksort, Algorithm
65: Find. Comm. ACM 4 321-322.

Kirschenhofer, P. and Prodinger, H. (1998). Comparisons in Hoare’s Find algorithm.
Combin. Probab. Comput. 7 111-120. MR1611049

Knuth, D. E. (1973). Sorting and Searching. The Art of Computer Programming 3.
Addison-Wesley, Reading, MA. MR445948

Louchard, G. (1987). Exact and asymptotic distributions in digital binary search trees.
Theor. Inform. Appl. 21 479-496. MR928772

Mahmoud, H. M. (1992). Evolution of Random Search Trees. Wiley, New York.
MR1140708

Mahmoud, H. M. (2000). Sorting: A Distribution Theory. Wiley, New York. MR1784633

Mahmoud, H. M. and Neininger, R. (2003). Distribution of distances in random binary
search trees. Ann. Appl. Probab. 13 253-276. MR1951999

Sedgewick, R. and Flajolet, Ph. (1996). An Introduction to the Analysis of Algorithms.
Addison-Wesley, Reading, MA.


Institut für Mathematische Stochastik
Universität Hannover
Postfach 60 09
D-30060 Hannover
Germany
E-mail: rgrubel@stochastik.uni-hannover.de


